Abstract: Academic performance prediction is an indispensable task for policymakers. Academic
performance is frequently examined using classical statistical software, which can be used to detect
logical connections between socioeconomic status and academic performance. Prediction accuracy
then depends on these presumed connections. To eliminate the effects of such logical
relationships on accuracy, machine learning models were extended with education and
socioeconomic data to predict academic performance. The decision tree, random forest, logistic
regression, support vector machine, and neural network were tested. The neural network
model can be used by policymakers to forecast academic performance, which in turn can aid in the
formulation of various policies, such as those regarding funding and teacher selection. Finally, this
study demonstrated the feasibility of machine learning as an auxiliary educational decision-making
tool for use in the future.

I. INTRODUCTION

Educational systems have widely used standardized examinations as a large-scale means of
sorting students efficiently. In terms of evaluation efficiency, standardized test scores are
overwhelmingly favored as a means of identifying talent over other qualities that schools ought to
emphasize more, such as moral character, life adaptability, non-cognitive skills, and social
responsibility. These preferences are rooted in the strengths of standardized tests, which are a
product of historical and social conventions. Paper-and-pencil examinations built on a series of
archetypal questions offer considerable and obvious advantages, including practicality,
reliability, good content validity, convenience, accessibility, and openness.

Despite the usefulness of traditional tests in assessing students’ knowledge and skills, several
other factors that affect academic performance are often overlooked. One significant
factor identified in predictive studies is socioeconomic status (SES), which plays a vital role in
widening the academic performance gap between students in rural and urban institutions. In the
subcontinent, high SES often correlates with above-average exam scores, highlighting the significant
impact of SES on educational performance. Many researchers and educators continue to explore the
effects of SES on academic performance using correlation and regression analysis. This
study instead employs machine learning (ML) models, a novel approach in predictive studies, to
investigate the impact of SES on student academic performance. Numerous studies have
demonstrated that ML prediction is considerably more accurate than classical statistical methods
such as correlation and linear regression. As an artificial intelligence approach, ML has had a
far-reaching influence on how the vast amounts of facts and numerical data generated by computers
are handled through simulations of the human brain. For instance, an ML algorithm is better than
conventional models at analyzing large volumes of internet data, since it enables relatively rapid,
highly accurate prediction on large datasets. Applying ML algorithms also enables researchers and
teachers to recognize the key factors that strongly influence student performance and to find more
effective ways to improve teaching quality. The problem is that previous studies relied on
small-scale, incomplete, and restricted data pools that addressed certain groups under limited
conditions. Such a scope cannot ensure that ML prediction is effective overall, and a large
representative sample has yet to be used to further verify the precision of ML results.

Figure. Diagram showing how ML uses big data to predict academic performance.


II. SYSTEM ANALYSIS
Data Description 


Using population and school data, this study trained five ML models: a decision tree, a random
forest, logistic regression, a support vector machine, and a neural network.
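
As a rough illustration of how five such models might be trained and compared, the following
minimal sketch uses scikit-learn. The choice of library, the hyperparameters, the feature scaling,
and the 75/25 split are all assumptions, since the source does not state them, and the data are
synthetic stand-ins generated with make_classification rather than the study's population and
school data. The helper name evaluate is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data. The study's population and school
# dataset is not reproduced here, so features and labels are generated.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# ASSUMPTION: illustrative hyperparameters; the source does not report its own.
# Scale-sensitive models are wrapped in a pipeline with standardization.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support vector machine": make_pipeline(StandardScaler(), SVC()),
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and return {name: (training accuracy, testing accuracy)}."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = (
            model.score(X_train, y_train),
            model.score(X_test, y_test),
        )
    return results


results = evaluate(models, X_train, X_test, y_train, y_test)
for name, (train_acc, test_acc) in results.items():
    print(f"{name:<24} train={train_acc:.2f} test={test_acc:.2f}")
```

The accuracies printed by this sketch describe the synthetic data only and say nothing about the
study's results; the point is the shape of the workflow, in which every model is fitted on the
same training split and scored on the same held-out split.
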

When we evaluated how county population affects academic performance, we found that a large
county population (1M to 1.58M; red box and red arrow in the Figure) led to a lower percentage of
advanced students and a higher percentage of below-basic students. A larger county population
thus suggested lower academic performance. Conversely, a smaller county population helped to
improve students’ overall academic performance.


III. SYSTEM CONSTRUCTION


Simulation Results: Final Student Performance Model
We evaluated four ML prediction methods: decision tree, random forest, support vector
machine, and neural network. The decision tree, random forest, and support vector machine
achieved testing accuracies of 48%, 54%, and 51%, respectively (Table). The neural network
achieved the highest testing accuracy, 60%. As a result, this paper used the neural network
method for the next step of the analysis.

The predictions of the ML models are compared with the observed values in the Figure.


Method                  Classifier                 Training accuracy  Testing accuracy
Decision tree           Decision Tree Classifier   94%                48%
Random forest           Random Forest Classifier   94%                54%
Support vector machine  Support Vector Classifier  59%                51%
Neural network          MLP Classifier             61%                60%
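
One way to read the table is through the difference between training and testing accuracy, which
is commonly taken as a rough sign of overfitting when it is large. The short sketch below computes
that difference from the reported figures and picks the method with the highest testing accuracy.
The term "generalization gap" and the helper name generalization_gap are illustrative choices and
do not come from the source.

```python
# Training and testing accuracy (percent), copied from the table above.
table = {
    "Decision tree": (94, 48),
    "Random forest": (94, 54),
    "Support vector machine": (59, 51),
    "Neural network": (61, 60),
}


def generalization_gap(train_acc, test_acc):
    """Return training minus testing accuracy, in percentage points."""
    return train_acc - test_acc


# Gap per method, plus the method with the highest testing accuracy.
gaps = {name: generalization_gap(tr, te) for name, (tr, te) in table.items()}
best = max(table, key=lambda name: table[name][1])

for name, gap in gaps.items():
    print(f"{name:<24} gap = {gap} points")
print("Highest testing accuracy:", best)
```

On these figures the two tree-based methods show gaps of 46 and 40 percentage points, while the
neural network shows a gap of 1 point and the highest testing accuracy, which is consistent with
its selection for the next step of the analysis.
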
Figure. Percentage of advanced students by ML models (Prediction versus reality).


IV. CONCLUSION

On the basis of big data, this research demonstrated the feasibility of using ML models to predict
class academic performance. To this end, we used an ML model that achieves fast and precise
predictions. This study confirmed that ML models are accurate and effective instruments. Based on
the ML models, we found that a well-educated population in small, lower-income counties
could contribute to higher academic performance. Finally, SES exerts a significant impact on the
rural-urban performance gap. The ML models are expected to provide assistance and guidance to
education policymakers in the region in the future, for example in decisions on issues that may
affect performance, such as education budgets, hiring standards and practices, and
teacher-student ratios.